Here is my paper for the Rule-based Inference conference. The format
is close to -- but not absolutely identical to -- the one recommended
in Waterman and Hayes-Roth's memo.
To denote ITALICS, we use the notation &i[I am italicised!!].
To denote underlining, we say &u[I'm underlined.].
That is, ampersand, followed by "i" or "u", followed by the text in
square brackets. Please note that this paper appears very long
because it is in fixed-width font; the XGP version is of course much
more compact. The name of this file is HAW.DOC[c380dl22], on
CMU-10a.
I am in the process of finalizing the design of the new discovery
program (the successor to AM), so any feedback which you have to the
paper would be very welcome.
-- Doug Lenat
&i[Designing a Rule System That
Searches for Scientific Discoveries]
&i[Douglas B. Lenat
and
Gregory Harris]
&i[Computer Science Department
Carnegie-Mellon University
Pittsburgh, Pa. 15213]
DRAFT (&i[January 15, 1977])
This work was supported in part by the Defense Advanced Research
Projects Agency (F44620-73-C-0074) and monitored by the Air Force
Office of Scientific Research.
&i[1. The Basic Argument]
&i[2. Early Design Constraints]
&i[3. &u[`AM']: A Rule System For Math Theory Formation]
&i[3.1.] Discovery in Mathematics as Heuristic Rule-Guided Search
&i[3.2.] Representation of Mathematical Knowledge
&i[3.3.] Top-level Control: An Agenda of Promising Questions
&i[3.4.] Low-level Control: A Lattice of Heuristic Rules
&i[3.5.] Behavior of this Rule System
&i[4. Reexamining the Design]
&i[4.1.] Data Structures
&i[4.2.] Rules
&i[4.3.] Distribution of Knowledge Between Rules and DS
&i[4.4.] Interpreter
&i[5. Speculations for a New Discovery System]
&i[5.1.] A New Set of Design Constraints
&i[ABSTRACT]
Some scientific inference tasks (including mass spectrum
identification [Dendral], medical diagnosis [Mycin], and math
theory development [AM]) have been successfully modelled as rule-
directed search processes. These rule systems are designed quite
differently from "pure production systems". By concentrating upon
the design of one program (AM), we shall show how 12 kinds of
design deviations arise from (&i[i]) the level of sophistication of
the task which the system is designed to perform, (&i[ii]) the
inherent nature of the task, and (&i[iii]) the designer's &i[view]
of the task. The limitations of AM suggest even more radical
departures from traditional rule system architecture. All these
modifications are then collected into a new, complicated set of
constraints on the form of the data structures, the rules, the
interpreter, and the distribution of knowledge between rules and
data structures. These new policies sacrifice uniformity in the
interests of clarity, efficiency and power derivable from a
thorough characterization of the task. Rule systems whose
architectures conform to the new design principles will be more
awkward for many tasks than would "pure" systems. Nevertheless,
the new architecture should be significantly more powerful and
natural for building rule systems that do scientific discovery
tasks.
1. The Basic Argument
Although rule-based computation was originally used for formal and systems
purposes [Post,Markov,Floyd], researchers in Artificial Intelligence (AI) found
that the same methodology was also useful for modelling a wide variety of
sophisticated tasks. Many of these early AI rule-based programs -- called
"production systems" -- served as information processing models of humans
performing cognitive tasks (digit recall [19], algebra word problem solving [1],
poker playing [23], etc. [16,18]).
There were many design constraints present in the original formal rule based
systems. Many of these details were preserved in the AI production rule based
programs (e.g., forcing all state information into a single string of tokens).
But there were many changes. The whole notion of "what a rule system really is"
changed from an effective problem statement to a tendency to solve problems in a
particular way. One typical corollary of this change of view was that instead
of &i[no] external inputs whatsoever, there was now a &i[presumption] of some
"environment" which supplied new entries into the token sequence. In the next
section, in the box, is an articulation of these traditional (i.e., early) AI
principles for designing "pure" [7] production systems.
Due to the early successes, psychological applicability, and aesthetic
simplicity afforded by production systems, AI researchers began to write rule
systems (RSs) to perform informal inductive inference tasks (mass spectrum
identification [4], medical diagnosis [23] and consultation dialogue [6], speech
understanding [14], non-resolution theorem proving [0], math research [13], and
many more).
Yet it seems that most of the large, successful RSs have violated many of the
"pure production system" guidelines. The purpose of this paper is to show that
such "exceptions" were inevitable, because the early design constraints are
[1]
naive (with respect to their use) .
The essence of the naive architecture is to opt for simplicity in all things,
since there is very little one can say about RSs in general. As more becomes
known about the task of the RS, it turns out that some of that new knowledge
takes the form of specific constraints on the design of the RS itself (as
distinct from what specific knowledge we choose to represent within that
design). Sometimes a new constraint directly contradicts the early, domain-
independent one; sometimes it is merely a softening or augmentation of the old
constraint.
After examining the "pure" architecture, we shall examine in detail the design
of one particular rule system which discovers and studies mathematical concepts.
Deviations from the pure architecture will be both frequent and extreme.
Subsequent sections will analyze these differences. It will be shown that each
one is plausible -- usually for reasons which depend strongly on the "scientific
discovery" domain of the RS. Some of the limitations of this RS will be treated,
and their elimination will be seen to require abandoning still more of the
original design constraints.
---------------
[1] That is, each statement is &u[not] naive with respect to its invention, nor
its initial uses. Rather, its limitations have only recently emerged, when
trying to apply it to more complex tasks.
When these modifications are collected, in the final section, we shall have a
quite different set of principles for building RSs. Not only will naivete have
been lost: so will generality (the breadth of kinds of knowledge representable,
the totality of tractable tasks). Rule systems conforming to the new design
will be awkward for many tasks. However, they should be significantly more
powerful and natural for scientific inference tasks.
2. Early Design Constraints
By a &i[rule system] (RS) we shall mean any collection of condition-action
&i[rules], together with associated &i[data structures] (DS; also called
&i[memories]) which the rules may inspect and alter. There must also be a
policy for &i[interpretation]: detecting and firing relevant rules.
These definitions are deliberately left vague. Many details must be specified
for any actual rule system (e.g., What may appear in the condition part of a
rule?). This specification process is what we mean by &i[designing] a RS.
In the box below is an articulation of the design of the early general-purpose
AI production rule systems. Notice the common theme: the adequacy of simplicity
in all dimensions.
FIGURE 1: Early Rule System Architecture
[1. ]Principle of Simple Memories. One or two uniform data
structures define sufficient memories for a rule system to read
from and write into. The format for entries in these structures is
both uncomplicated and unchanging.
[2. ]Principle of Simple DS Accesses. The primitive read and
write operations are as simple and low-level as possible;
typically they are simply a membership-test type of read, and an
insert-new-element type of write. More complicated, algorithmic
operations on the memories are not available to the rules.
[3. ]Principle of Isolated DS Elements. Elements of the uniform
DS cannot point to (parts of) other elements. This is the dual of
the preceding principle: if we aren't allowed to &u[chase]
pointers, there may as well not be any.
[4. ]Principle of Continuous Attention. In addition to the one or
two simple data structures, there may be an external environment
which continuously inserts stimuli into the DS. The interleaving
of stimuli and internally generated symbols is managed quite
trivially: (a) The stimuli are simply inserted into the DS as new
elements; (b) Each rule is so small and quick that no
"interruption" mechanism is necessary. The interpreter may ignore
any suddenly-added stimulus until the current rule finishes
executing. The RS may be viewed as "continuously" attending to
the environment.
[5. ]Principle of Opaque Rules. Rules need not have a format
inspectable by other rules, but rather can be coded in whatever
way is convenient for the programmer and the rule interpreter;
i.e., the set of rules is &u[not] treated as one of the RS's data
structures. E.g., the condition parts of rules may be barred from
fully analyzing the set of productions [22], and the action parts
of rules may not be allowed to delete existing rules [24].
[6. ]Principle of Simple Rules. Rules consist of a left- and a
right-hand side which are quite elementary: The left hand side
(lhs, situation characterization, IF-part, condition) is typically
a pattern-match composed with a primitive DS read access, and the
right hand side (rhs, consequence, THEN-part, action) is also
simply a primitive DS write access. There is no need for
sophisticated bundles of DS accesses on either side of a rule.
Thus several extra rules should be preferred to a single rule with
several actions.
[7. ]Principle of Encoding by Coupled Rules. A collection of
interrelated rules is used to accomplish each subtask; i.e.,
wherever a subroutine would be used in a procedural programming
language. For example, programming an iteration may require many
rules "coupled" by writing and reading special (i.e., otherwise
meaningless) loop control notes in the data structure.
[8. ]Principle of Knowledge as Rules. All knowledge of substance
should be, can be, and is represented as rules. This includes all
non-trivial domain-dependent information. The role of the DS is
just to hold simple descriptive information, intermediate control
state messages, recent stimuli from the environment, etc.
[9. ]Principle of Simple Interpretation. The topmost control flow
in the RS is via a simple rule interpreter. After a rule fires,
it is essential that any rule in the system may potentially be the
next one to fire (i.e., it is forbidden to locate a set of
relevant rules and fire them off in sequence).
[10. ]Principle of Closure. The representations allowed by (1-8)
are sufficient and appropriate for organizing all the kinds of
knowledge needed for tasks for which RSs are designed.
This design was plausible &i[a priori], and worked quite well for its initial
applications (the simulation of simple human cognitive processes [16,19,24]).
But is this design proper for &i[any] RS, regardless of its intended task? In
particular, what about scientific inference tasks? Over the years, several
rule-based inference systems for scientific tasks have been constructed. With
each new success have come some deviations from the above principles [7]. Were
these mere aberrations, or is there some valid reason for such changes in
design?
We claim the latter. The task domain -- scientific discovery -- dictates a new
and quite different architecture for RSs. To study this phenomenon, we shall
describe, in the next section, one particular RS which defines new mathematical
concepts, studies them, and conjectures relationships between them. Subsequent
sections will explore the deviations of its design from the "pure" constraints
in the box above.
3. &u[`AM']: A Rule System For Math Theory Formation
A recent thesis [13] describes a program, called "AM", which gradually expands a
base of mathematical knowledge. The representation of math facts is somewhat
related to Actors [10] and Beings [12] in the partitioning of such domain
knowledge into effective, structured modules. Departing from the traditional
control structures usually associated with Actors, Beings, and Frames [15], AM
concentrates on one "interesting" mini-research question after another. These
"jobs" are proposed by -- and rated by -- a collection of around 250 situation-
action rules. Discovery in mathematics is modelled in AM as a rule-guided
exploration process. This view is explained below in Section 3.1 (see also
[21]). The representation of knowledge is sketched next, followed by a much more
detailed description of the rule-based control structure of AM. Finally, in
Section 3.5, the experimental results of the project are summarized.
3.1. Discovery in Mathematics as Heuristic Rule-Guided Search
The task which AM performs is the discovery of new mathematics concepts and
relationships between them. The simple paradigm it follows for this task is to
maintain a graph of partially-developed concepts, and to obey a large collection
of "heuristics" (rules which frequently lead to discoveries) which guide it to
define and study the most plausible thing next.
For example, at one point AM had some notions of sets, set-operations, numbers,
and simple arithmetic. One heuristic rule it knew said &i["If f is an
interesting relation, Then look at its inverse"]. This rule fired after AM had
studied "multiplication" for a while. The rhs of the rule then directed AM to
define and study the relation "divisors-of" (e.g., divisors-of(12) =
{1,2,3,4,6,12}). Another heuristic rule which later fired said &i["If f is a
relation from A into B, then it's worth examining those members of A which map
into &u[extremal] members of B"]. In this case, f was matched to "divisors-of",
A was "numbers", B was "sets of numbers", and an extremal member of B might be,
e.g., a very &i[small] set of numbers. Thus this heuristic rule caused AM to
define the set of numbers with no divisors, the set of numbers with only 1
divisor, with only 2 divisors, etc. One of these sets (the last one mentioned)
turned out subsequently to be quite important; these numbers are of course the
primes. The above heuristic also directed AM to study numbers with very
&i[many] divisors; such highly-composite numbers were also found to be
interesting.
This same paradigm enabled AM to discover concepts which were much more
primitive (e.g., cardinality) and much more sophisticated (e.g., the fundamental
theorem of arithmetic) than prime numbers. We shall now describe the AM program
in more detail.
3.2. Representation of Mathematical Knowledge
What exactly does it mean for AM to "have the notion of" a concept? It means
that AM possesses a frame-like data structure for that concept. For instance,
here is how one concept looked after AM had defined and explored it:
FIGURE 2: A Typical Concept
NAME: Prime Numbers
DEFINITIONS:
ORIGIN: Number-of-divisors-of(x) = 2
PREDICATE-CALCULUS: Prime(x) ≡ (Forall z)(z|x implies z=1 xor z=x)
ITERATIVE: (for x>1): For i from 2 to Sqrt(x), ¬(i|x)
EXAMPLES: 2, 3, 5, 7, 11, 13, 17
BOUNDARY: 2, 3
BOUNDARY-FAILURES: 0, 1
FAILURES: 12
GENERALIZATIONS: Numbers, Numbers with an even number of divisors,
Numbers with a prime number of divisors
SPECIALIZATIONS: Odd Primes, Prime Pairs, Prime Uniquely-addables
CONJECS: Unique factorization, Goldbach's conjecture, Extrema of Divisors-of
ANALOGIES:
Maximally-divisible numbers are converse extremes of Divisors-of
Factor a group into simple groups
INTEREST: Conjectures tying Primes to Times, to Divisors-of, to closely
related operations.
WORTH: 800
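To make the shape of such a frame concrete, here is a minimal sketch in modern
Lisp notation (AM itself was written in Interlisp). This is our illustration,
not AM's code; the facet names come from Figure 2 and the accessors are
invented for exposition.

    ;; Each concept is a named bundle of facets; a property list keyed
    ;; by facet name is the simplest faithful rendering of Figure 2.
    (defvar *concepts* (make-hash-table :test #'eq))

    (defun create-concept (name &rest facets)
      (setf (gethash name *concepts*) facets))

    (defun facet (concept facet-name)
      "Read one facet of a concept (NIL if the facet is empty)."
      (getf (gethash concept *concepts*) facet-name))

    (create-concept 'primes
      :examples          '(2 3 5 7 11 13 17)
      :boundary          '(2 3)
      :boundary-failures '(0 1)
      :generalizations   '(numbers)
      :specializations   '(odd-primes prime-pairs)
      :worth             800)

    ;; (facet 'primes :examples)  =>  (2 3 5 7 11 13 17)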
3.3. Top-level Control: An Agenda of Promising Questions
AM is initially given a collection of 115 core concepts, with only a few facets
(i.e., slots) filled in for each. AM repeatedly chooses some facet of some
concept, and tries to fill in some entries for that particular slot. To decide
which such job to work on next, AM maintains an &i[agenda] of jobs, a global
queue ordered by priority [2]. A typical job is &i["Fill-in examples of
Primes"]. The agenda may contain hundreds of entries such as this one. AM
repeatedly selects the top job from the agenda and tries to carry it out. This
is the whole control structure! Of course, we must still explain how AM creates
plausible new jobs to place on the agenda, how AM decides which job will be the
best one to execute next, and how it carries out a job.
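In outline, then, the top level is a single loop. The sketch below is a
reconstruction for exposition, not AM's code; the rule structure and the
rules-for function are assumptions.

    (defstruct rule lhs rhs)   ; lhs: predicate on a job; rhs: action

    (defun am-top-level (agenda rules-for)
      "AGENDA: a list of jobs, best first.  RULES-FOR: a function taking
       a job and returning its potentially relevant rules.  A rule's rhs
       here returns the (possibly extended) agenda, since firing it may
       fill in facets, create concepts, and propose new jobs."
      (loop while agenda do
        (let ((job (pop agenda)))
          (dolist (rule (funcall rules-for job))
            (when (funcall (rule-lhs rule) job)      ; truly relevant?
              (setf agenda (funcall (rule-rhs rule) job agenda)))))))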
If the job were &i["Fill in new Algorithms for Set-union"], then &i[satisfying]
it would mean actually synthesizing some new procedures, some new LISP code
capable of forming the union of any two sets. A heuristic rule is &i[relevant]
to a job iff executing that rule brings AM closer to satisfying that job.
Potential relevance is determined &i[a priori] by where the rule is stored. A
rule tacked onto the Domain/range facet of the Compose concept would be presumed
potentially relevant to the job &i["Fill in the Domain of Insert∘Delete"].
The lhs of each potentially relevant rule is evaluated to determine whether the
rule is truly relevant.
Once a job is chosen from the agenda, AM gathers together all the potentially
relevant heuristic rules -- the ones which might accomplish that job. They are
executed, and then AM picks a new job. While a rule is executing, three kinds
of actions or effects can occur:
(&i[i]) Facets of some concepts can get filled in (e.g., examples of primes may
actually be found and tacked onto the "Examples" facet of the "Primes"
concept). A typical heuristic rule which might have this effect is:
If examples of X are desired, where X is a kind of Y (for some more
general concept Y),
Then check the examples of Y; some of them may be examples of X as well.
For the job of filling in examples of Primes, this rule would have AM notice
that Primes is a kind of Number, and therefore look over all the known
examples of Number. Some of those would be primes, and would be transferred
to the Examples facet of Primes.
(&i[ii]) New concepts may be created (e.g., the concept "primes which are
uniquely representable as the sum of two other primes" may somehow be
deemed worth studying). A typical heuristic rule which might result in this
new concept is:
If some (but not most) examples of X are also examples of Y (for some
concept Y),
Then create a new concept defined as the intersection of those 2
concepts (X and Y).
Suppose AM has already isolated the concept of being representable as the
sum of two primes in only one way (AM actually calls such numbers "Uniquely-
prime-addable numbers"). When AM notices that some primes are in this set,
the above rule will create a brand new concept, defined as the set of
numbers which are both prime and uniquely prime addable.
(&i[iii]) New jobs may be added to the agenda (e.g., the current activity may
suggest that the following job is worth considering: "Generalize the concept
of prime numbers"). A typical heuristic rule which might have this effect
is:
If very few examples of X are found,
Then add the following job to the agenda: "Generalize the concept X".
The concept of an agenda is certainly not new: schedulers have been around for a
long time. But one important feature of AM's agenda scheme &i[is] a new idea:
attaching -- and using -- a list of quasi-symbolic reasons to each job which
explain why the job is worth considering, why it's plausible. It is the
responsibility of the heuristic rules to include reasons for any jobs they
propose. For example, let's reconsider the heuristic rule mentioned in (&i[iii])
above. It really looks more like the following:
If very few examples of X are found,
Then add the following job to the agenda: "Generalize the
concept X", for the following reason: "X's are quite
rare; a slightly less restrictive concept might be more
interesting".
If the same job is proposed by several rules, then several different reasons for
it may be present. In addition, one ephemeral reason also exists: "Focus of
attention" [9]. Any jobs which are related to the one last executed get "Focus
of attention" as a bonus reason. AM uses all these reasons to decide how to
rank the jobs on the agenda. Each reason is given a rating (by the heuristic
which proposed it), and the ratings are combined into an overall priority rating
for each job on the agenda. The jobs are ordered by these ratings, so it is
trivial to select the job with the highest rating. Note that if a job already on
the agenda is re-proposed for a new reason, then its priority will increase. If
the job is re-proposed for an already-present reason, however, the overall
rating of the job will &i[not] increase. This turned out to be an important
enough phenomenon that it was presented in [13] as a necessary design
constraint.
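A sketch of that bookkeeping follows. The rating arithmetic here is invented
for illustration; AM's own formulas are given in [13].

    (defstruct job task reasons priority)

    (defun propose (agenda task reason rating)
      "Propose TASK for REASON.  A genuinely new reason raises the job's
       priority; re-proposing for an already-present reason changes nothing."
      (let ((job (find task agenda :key #'job-task :test #'equal)))
        (cond ((null job)                          ; brand-new job
               (cons (make-job :task task
                               :reasons (list reason)
                               :priority rating)
                     agenda))
              ((member reason (job-reasons job) :test #'equal)
               agenda)                             ; old reason: no change
              (t (push reason (job-reasons job))   ; new reason for old job:
                 (incf (job-priority job) rating)  ; priority increases
                 agenda))))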
AM uses each job's list of reasons in other ways. Once a job has been selected,
the quality of the reasons is used to decide how much time and space the job
will be permitted to absorb, before AM quits and moves on to a new job. Another
use is to explain to the human observer precisely why the chosen top job is a
plausible thing for AM to concentrate upon.
3.4. Low-level Control: A Lattice of Heuristic Rules
The hundreds of concepts AM possesses are interrelated in many ways. One main
organization is that provided by their Generalization and Specialization facets.
The concepts may be viewed as nodes on a large lattice whose edges are labelled
Genl/Spec. The importance of this organization stems from various
&i[heritability] properties. For example, Spec is transitive, so the
specializations of Numbers include not only Primes but all &i[its]
specializations as well.
Let us describe a second, very important heritability property. Each of the 250
heuristic rules is attached to the most general (or abstract) concept it is
deemed appropriate for. The relevance of heuristic rules is assumed to be
inherited by all its specializations. For example, a heuristic method which is
capable of inverting any function will be attached to the concept "Function";
but it is certainly also capable of inverting any permutation. If there are no
known methods specific to the latter job, then AM will follow the Genl links
upward from Permutation to Bijection to Function..., seeking methods for
inversion. Of course the more general concepts' methods tend to be weaker than
those of the specific concepts.
In other words, the Genl/Spec graph of concepts induces a graph structure upon
the set of heuristic rules. This permits potentially relevant rules to be
located efficiently. Here is one more example of how this heritability works in
practice: Immediately after the job "Fill in examples of Set-equality" is
chosen, AM asks each generalization of Set-equality for help. Thus it asks for
ways to fill in examples of any Predicate, any Activity, any Concept, and
finally for ways to fill in examples of Anything. One such heuristic rule known
to the Activity concept says: &i["If examples of the domain of the activity f
are already known, Then actually execute f on some random members of its
domain."] Thus to fill in examples of Set-equality, its domain facet is
inspected, and AM sees it takes a pair of sets as its arguments. Then AM
accesses the Examples facet of the concept Set, where it finds a large list of
sets. The lhs is thus satisfied, so the rule is fired. Obeying the heuristic
rule, AM repeatedly picks a pair of the known sets at random, and sees if they
satisfy Set-equality (by actually running the LISP function stored in the
Algorithms facet of Set-equality). While this will typically return False, it
will occasionally locate -- by random chance -- a pair of equal sets.
Other heuristics, tacked onto other generalizations of Set-equality, provide
additional methods for executing the job "Fill in examples of Set-equality." A
heuristic stored on the concept Any-concept says to symbolically instantiate the
definition. After spending much time manipulating the recursive definition of
Set-equality, a few trivial examples (like {}={}) are produced. Notice that (as
expected) the more general the concept is, the weaker (more time-consuming, less
chance for success) its heuristics tend to be. For this reason, AM consults
each concept's rules in order of increasing generalization.
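That lookup can be sketched as follows (illustrative names: genl reads the
Generalizations facet of one concept, rules-on reads the heuristics attached
directly to it):

    (defun relevant-rules (concept genl rules-on)
      "Ripple up the Genl links level by level, collecting each concept's
       attached heuristic rules, most specific concepts first."
      (let ((level (list concept))   ; concepts at the current generality
            (seen  '())
            (rules '()))
        (loop while level do
          (dolist (c level)
            (setf rules (append rules (funcall rules-on c))))
          (setf seen  (append level seen))
          (setf level (set-difference
                       (remove-duplicates
                        (mapcan (lambda (c) (copy-list (funcall genl c)))
                                level))
                       seen)))
        rules))

For the Set-equality job above, this ripples from Set-equality up through
Predicate, Activity, and Any-concept to Anything, so the weakest rules arrive
last.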
3.5. Behavior of this Rule System
As the preceding four sections indicate, the dynamic behavior of AM was as
follows: a job is chosen from the agenda, potentially relevant rules are located
by their position in the Genl/Spec lattice, their lhs (left-hand sides) are
evaluated to find those which actually trigger, they are then executed (in order
of decreasing specificity) until they are all executed (or until some &i[local],
self-imposed limit on time or space is exceeded), and the cycle repeats. AM has a
modest facility that prints out a description of these activities as they occur.
Here is a tiny excerpt of this self-trace monologue.
** &u[Job 65:] ** Fill in Examples of the concept "Divisors-of".
3 Reasons: (1) No known examples of Divisors-of so far.
(2) TIMES, which is related to Divisors-of, is now very interesting.
(3) Focus of attention: AM recently defined Divisors-of.
26 examples found, in 9.2 seconds. e.g., Divisors-of(6)={1 2 3 6}.
** &u[Job 66:] ** Consider numbers having small sets of Divisors-of.
2 Reasons: (1) Worthwhile to look for extreme cases.
(2) Focus of attention: AM recently worked on Divisors-of.
Filling in examples of numbers with 0 divisors.
0 examples found, in 4.0 seconds.
Conjecture: no numbers have precisely 0 divisors.
Filling in examples of numbers with 1 divisors.
1 examples found, in 4.0 seconds. e.g., Divisors-of(1) = {1}.
Conjecture: 1 is the only number with precisely 1 divisor.
Filling in examples of numbers with 2 divisors.
24 examples found, in 4.0 seconds. e.g., Divisors-of(13) = {1 13}.
No obvious conjecture. May merit more study.
Creating a new concept: "Numbers-with-2-divisors".
Filling in examples of numbers with 3 divisors.
11 examples found, in 4.0 seconds. e.g., Divisors-of(49) = {1 7 49}.
All numbers with 3 divisors are also Squares. Definitely merits more study.
Creating a new concept: "Numbers-with-3-divisors".
** &u[Job 67:] ** Consider the square-roots of Numbers-with-3-divisors.
2 Reasons: (1) Numbers-with-3-divisors are unexpectedly also Perfect Squares.
(2) Focus of attention: AM recently worked on Nos-with-3-divisors.
All square-roots of Numbers-with-3-divisors seem to be Numbers-with-2-divisors.
e.g., Divisors-of(Square-root(169)) = Divisors-of(13) = {1 13}.
Even the converse of this seems empirically to be true.
i.e., the square of each No-with-2-divisors seems to be a No-with-3-divs.
The chance of coincidence is below acceptable limits.
Boosting the interestingness rating of each of the concepts involved.
** &u[Job 68:] ** Consider the squares of Numbers-with-3-divisors.
3 Reasons: (1) Squares of Numbers-with-2-divisors were interesting.
(2) Square-roots of Numbers-with-3-divisors were interesting.
(3) Focus of attention: AM recently worked on Nos-with-3-divisors.
Now that we've seen how AM works, and have been exposed to a few of its "local"
results, let's take a moment to discuss the totality of the mathematics which AM
carried out. AM began its investigations with scanty knowledge of a hundred
elementary concepts of finite set theory. Most of the obvious set-theoretic
concepts and relationships were quickly found (e.g., de Morgan's laws;
singletons), but no sophisticated set theory was ever done (e.g.,
diagonalization). Rather, AM discovered natural numbers and went off exploring
elementary number theory. Arithmetic operations were soon found (as analogs to
set-theoretic operations), and AM made surprising progress in divisibility
theory. Prime pairs, Diophantine equations, the unique factorization of numbers
into primes, Goldbach's conjecture -- these were some of the nice discoveries by
AM. Many concepts which we know to be crucial were never uncovered, however:
remainder[2], gcd, infinity, proof, greater-than, etc.
All the discoveries mentioned were made in a run lasting one cpu hour
(Interlisp+100k, SUMEX PDP-10 KI). Two hundred jobs in toto were selected from
the agenda and executed. On the average, a job was granted 30 cpu seconds, but
actually used only 18 seconds. For a typical job, about 35 rules were located
as potentially relevant, and about a dozen actually fired. AM began with 115
concepts and ended up with three times that many. Of the synthesized concepts,
half were technically termed "losers" (both by the author and by AM), and half
the remaining ones were of only marginal interest.
Although AM fared well according to several different measures of performance
(see Section 7.1 in [13]), of greatest significance are its &i[limitations].
This subsection will merely report them, and the next section will analyze
---------------
[2] This concept, and many of the other "omissions", &u[could] have been
discovered by the existing heuristic rules in AM. The paths which would have
resulted in their definition were simply never rated high enough to explore.
whether they were caused by radical departures from the naive production-system
architecture, or by departing not far enough from that early design.
As AM ran longer and longer, the concepts it defined were further and further
from the primitives it began with. Thus "prime-pairs" were defined using
"primes" and "addition", the former of which was defined from "divisors-of",
which in turn came from "multiplication", which arose from "addition", which was
defined as a restriction of "union", which (finally!) was a primitive concept
(with heuristics) that we had supplied to AM initially. When AM subsequently
needed help with prime pairs, it was forced to rely on rules of thumb supplied
originally about &i[union]ing. Although the heritability property of heuristics
did ensure that those rules were still valid, the trouble was that they were too
general, too weak to deal effectively with the specialized notions of primes and
arithmetic. For instance, one general rule indicated that A union B would be
interesting if it possessed properties absent both from A and from B. This
translated into the prime-pair case as "&i[If p+q=r, and p,q,r are primes, Then
r is interesting if it has properties not possessed by p or by q.]" The search
for categories of such interesting primes &i[r] was of course barren. It showed
a fundamental lack of understanding about numbers, addition, odd/even-ness, and
primes.
As the derived concepts moved further away from finite set theory, the efficacy
of the initial heuristics decreased. AM began to "thrash", appearing to lose
most of its heuristic guidance. It worked on concepts like "prime triples",
which is not a rational thing to investigate. The key deficiency was that of
adequate &i[meta]-rules [6]: heuristics which cause the creation of new
heuristics and the modification of existing ones.
Aside from the preceding major limitation, most of the other problems pertain to
missing knowledge. Many concepts one might consider basic to discovery in math
are absent from AM; analogies were under-utilized; physical intuition was
absent; the interface to the user was far from ideal; etc.
4. Reexamining the Design
Let us now consider the major components of a RS's design and how AM treated
them: the DS, the rules, the distribution of knowledge between DS and rules, and
the rule interpretation policy. For each component, AM's architecture failed to
adhere strictly to the pure RS guidelines. Were these departures worth the loss
of simplicity? Were the deviations due to the task domain (scientific
discovery), to the task view (heuristically guided growth of structured
theories), or to other sources? These are the kinds of questions we shall
address in each of the following subsections.
4.1. Data Structures
We recognize that a single uniform DS (e.g., an infinite STM [19]) is universal
in the Turing sense of being &i[formally] adequate: One can encode any
representation in a linear, homogeneous DS. The completeness of such a DS
design notwithstanding, we believe that encouraging several distinct, special-
purpose DSs will enhance the performance of a discovery system. That is, we are
willing to sacrifice aesthetic purity of DSs for clarity, efficiency, and power.
In this section we will explore this tradeoff.
The data structures used in AM are unlike the uniform memories suggested by the
first design constraint (see Fig. 1). One DS -- the agenda -- holds an ordered
list of plausible questions for the system to concentrate on, a list of jobs to
work on. Another DS is the graph of concepts AM knows about. Each concept
itself consists of much structured information (see Fig. 2). Still other
information is present as values of certain functions and global variables: the
cpu clock, the total number of concepts, the last thing typed out to the user,
etc. All these types of information are accessed by the lhs's (left hand sides)
of heuristic rules, and affected by rhs's.
Why is there this multitude of diverse DSs? Each type of knowledge (jobs, math
knowledge, system status) needs to be treated quite differently. The primitive
operations will vary with the type of information, and so should the DS.
For &u[jobs], the primitive kinds of accesses will be: picking the highest-rated
one, reordering one, deleting the lowest-rated one, merging new entries. A
natural choice to make these operations efficient is to keep the system's goals
in a queue ordered by their rating or partially-ordered by those ratings that
are commensurable. For &u[resource information], like the amount of cpu time
used so far, it is much more natural to "read" an implementation-dependent
measure of time or space used or number of concepts created than to periodically
add notes ("cpu time is now 37.4") to a uniform memory. For &u[math concepts],
we have a much less volatile situation. We view them as an ever-growing body of
highly-interrelated facts. Knowledge in this form is stable and rarely deleted.
When new knowledge is added, a great many "routine" inferences must be drawn. In
a uniform, linear memory, each would have to be drawn explicitly; in a
structured one (as the Genl/Spec graph structure provides) they may be
accomplished through the tacit (analogical) characteristics of the
representation, simply by deciding &i[where] to place the information.
Each kind of knowledge dictates a set of appropriate kinds of primitive
operations to be performed on it, which in turn suggest natural data structures
in which to realize it. The generality of this perspective on rule-based systems
is made more plausible by examining other RSs which deal with many types of
knowledge (e.g., [5]). If this is so, if the design proceeds from `knowledge to
be represented' to `a data structure to hold it', then fixing the capabilities
of DS access primitives &i[a priori] is suspect.
Therefore, we advocate the opposite: the RS designer is encouraged to name every
combination of "machine" operations that together comprise a single conceptual
access of data by rules. In AM, it is quite reasonable to expect that a request
like "find all generalizations of a given concept" would be such a primitive
(i.e., could be referred to by name). Even though it might cause the "machine"
(in this case, LISP) to run around the Genl/Spec graph, a single rule can treat
this as merely an "access" operation. The use of complex tests and actions is
not new; we simply claim that it is &i[always] preferable to package knowledge
(for which a reasonably fast algorithm is available) as a single action (though
it may have side-effects in the space of concepts) or a single test (so long as
its sole side-effect -- modulo caches -- is to signal). Primitive tests and
actions should be maximally algorithmic.
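"Find all generalizations of a given concept" is a good example: it can be
packaged as one named read access whose answer is cached, so a rule may treat
it as a primitive test even though the "machine" chases many pointers. A
sketch (names invented; the Genl lattice is assumed acyclic):

    (defvar *genl-closure-cache* (make-hash-table :test #'eq))

    (defun all-generalizations (concept genl)
      "A single conceptual read access: the transitive closure of the
       Genl links above CONCEPT.  The graph walk is paid for only once;
       thereafter the test is as cheap as any primitive access."
      (or (gethash concept *genl-closure-cache*)
          (setf (gethash concept *genl-closure-cache*)
                (let ((direct (funcall genl concept)))
                  (remove-duplicates
                   (append direct
                           (mapcan (lambda (c)
                                     (copy-list
                                      (all-generalizations c genl)))
                                   direct)))))))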
The naive view of designing a production rule system was that of defining a
machine. Our present view is that RSs do not &i[compute] so much as they
&i[guide attention]. In adopting this view (thereby separating the controller
from the effector), we recognize that we are giving up an attractive feature of
pure rule systems: a homogeneous basis for definition. For example, the rule
system designer must now spell out in detail the definitions of the DS accessing
functions; but the naive RS designer was simply able to take as &i[givens] the
matching and inserting operations (as specified in naive principle #6, Fig. 1),
and he built any more complicated ones out of these primitives[3]. In giving
up the old view of the RS as an abstract computing machine, the RS designer must
use another homogeneous substrate (e.g., LISP) in terms of which to define his
DSs and especially the procedures that process them.
We have seen that admitting complicated and varied DSs leads to stylized sets of
DS accesses. The DSs and their sets of read/write primitives must in turn be
explicitly defined (coded) by the designer. This seems like a high price to pay.
Is there any bright side to this? Yes, one rather interesting possibility is
opened up. Not only the RS designer, but the RS &i[itself] may define DSs and DS
access functions. In AM, this might take the form of dynamically defining new
kinds of facets. E.g., after "1-1 Function" is defined, and after some
properties of it have been discovered, it would be appropriate to introduce a
new facet called "inverse" for each (concept representing a) 1-1 function. In
AM, the actual definitions of the facets of every concept are complex enough
---------------
[3] Either by stringing out a sequence of primitives on one side of a rule, or by
handcrafting a tightly coupled bundle of rules (so firing such a rule would
simulate traversing one link of the kind that abound in AM's DSs).
(shared structure), inter-related enough (shared meaning), and interesting
enough (consistent heuristic worth) that a special concept was included for each
one (e.g., a concept called "Examples") which contained a definition,
description,... of the facet. Thus the same techniques for manipulating and
discovering math concepts operate on DS design concepts. Not only do math
theories emerge, so can new DS access functions (new facets; e.g., "Small
Boundary Examples").
It should be noted that in opting for non-uniform DSs, we have not necessarily
sacrificed efficiency. One need only compare the time to access a node in a
tree, versus in a linear list, to appreciate this fact.
Just how tangled up a DS should we tolerate? Should memory elements be
permitted to refer to (to "know about") each other? We believe the answer
depends upon the &i[type] of data structure involved. For the homogeneous DS
called for in the naive design, much simplicity is preserved by forbidding this
kind of interrelationship. But consider a DS like AM's graph of concepts. It is
growing, analogically interrelated, and it contains descriptions of its
elements. This richness (and sheer quantity) of information can be coded only
inefficiently in a uniform, non-self-referential manner. For another example,
consider AM's agenda of jobs. One reason for a job may simply be the existence
of some other job. In such a case, it seems natural for part of one entry on the
agenda (a reason part of one job) to point to another entry in the same DS
(point to another specific job on the agenda). Thus, inter-element pointers
&i[are] allowed, even though they blur a "pure" distinction between a DS and its
entries[4]. They play a necessary role in organizing large bodies of highly
interrelated information into structured modules.
There is yet another motivation for special-purpose DSs when the task of the RS
includes sensing an external environment. Using a uniform memory, external
stimuli are dumped into the working memory and rub shoulders with all the other
data. They must then be distinguished from the others. ("Must" because to
freely intermingle what one sees or is told with what one thinks or remembers is
to give way to endless confusion.) How much cleaner, less distracting, and safer
it is for stimuli to arrive in their own special place -- a place which might
well be a special purpose store such as an intensity &i[array] (not even a list
structure at all), or a low-level speech-segment memory. A linear memory (e.g.,
an infinite STM) is of course adequate; one could tag each incoming
environmental stimulus with a special flag. But the design philosophy we are
proposing is aimed at maximizing clarity and efficiency, not uniformity or
universality.
We know that this view of DSs means making a specialized design effort for each
class of knowledge incorporated into the RS. But that is desirable, as it buys
us performance, ease of coding, and ease of learning.
4.2. Rules
In the "pure" view of RSs, the rule store is not a full-fledged DS of the RS.
---------------
[4] In section 4.3 we will mention work that blurs this distinction even further.
For example, in Waterman's [23] poker player, rules may not be deleted.
Rychener [22] states that the only way his RS may inspect rules is by examining
the effect of those rules which have recently fired. Although AM had no
explicit taboo against inspecting rules, such analyses were in practice never
possible, since the rules were &i[ad hoc] blocks of LISP code. This eventually
turned out to be the main limitation of the design of AM. The ultimate
impediment to further discovery was the lack of rules which could reason about,
modify, delete, and synthesize other rules. AM direly needed to synthesize
specialized forms of the given general heuristic rules (as new concepts arose;
see the end of 3.5).
We want our heuristic rules to be added, kept track of, reasoned about,
modified, deleted, generalized, specialized,... whenever there is a good reason
to do so. Note that those situations may be very different from the ones in
which such a rule might fire. E.g., upon discovering a new, interesting
concept, AM should try to create some specially-tailored heuristic rules for it.
They wouldn't actually &i[fire] until much later, when their lhs's were
triggered. After having constructed such rules, AM might subject them to
improvement as it explores the new concept.
In sum, we have found that the discovery of heuristic rules for using new math
concepts is a necessary part of the growth of math knowledge. Hence, following
the argument in 4.1, the rules themselves should be DSs, and each rule might be
described by a concept with effective (executable) and non-effective (purely
descriptive) facets. This lesson was made all the more painful because it was
not new [5]. Apparently the need for reasoning about rules is common to many
tasks.
The current re-coding of AM does in fact have each rule represented as a
concept. What kinds of non-effective "facets" do they have? Recall that one of
the features of the original AM (as described in Section 3.3) was that with each
rule were associated some symbolic &i[reasons] which it could provide whenever
it proposed a new job for the agenda. So one kind of facet which every rule can
possess is "Reasons". What others are there? Some of them &i[describe] the rule
(e.g., its average cost); some facets provide a road map to the space of rules
(e.g., which rule schemata are mere specializations of the given one); some
facets record its derivation (e.g., the rule was proposed as an analog to rule X
because...), its redundancy, etc.
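In that rendering, a rule is built with exactly the machinery of Figure 2: an
effective IF/THEN part plus non-effective, descriptive facets. A sketch
(facet names from the text above; the rest is invented):

    (defun make-rule-concept (name lhs rhs &key reasons average-cost
                                                generalizations derivation)
      "A heuristic rule as a concept: executable parts plus descriptive
       facets which other rules may inspect, modify, or reason about."
      (list :name name
            :if   lhs    ; effective: predicate on the current situation
            :then rhs    ; effective: action on concepts and agenda
            :reasons reasons                 ; offered when proposing jobs
            :average-cost average-cost       ; descriptive: typical cost
            :generalizations generalizations ; road map: its rule schemata
            :derivation derivation))         ; e.g. "analog of rule X"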
There are some far-reaching consequences of our new awareness of the need to
reason about rules just as if they were any other concepts known to AM. When
one piece of knowledge relates to several rules, then one general concept, a
rule &i[schema], should exist to hold that common knowledge. Since each rule is
a concept, there will be a natural urge to exploit the same Genl/Spec
organization that proved so useful before. Heritability still holds; e.g., any
reason which explains rule R is also somehow a partial explanation of each
specialization of R. Rule schemata have cause to exist simply because they
generalize -- and hold much information which would otherwise have to be
duplicated in -- several specific rules. They may tend to be "big" and less
directly productive when executed, yet they are of value in capturing the
essence of the discovery techniques[5]. We put "big" in quotes because sheer
length (total number of lhs tests allowed, total number of rhs actions) is not
directly what we're talking about here. A general rule schema will capture many
regularities, will express an idea common to several more specific rules. It
will contain dual forms of the same rule, sophisticated types of variable-
binding (for the duration of the rule application), and searching may even be
required to find the actions of such a general rule. We may even wish to
consider every rule in the RS as a rule schema of some level of generality, and
much processing may go on to find the particular instance(s) of it which should
be applied in any particular situation.
One use of a rule schema might be to &i[name] a collection of rules that are
coupled by together fulfilling a single function. Consider a collection of
rules which is tightly coupled, say to perform an iteration. Much knowledge
about an iteration loop as a whole may exist. Where is such descriptive
information to be stored and sought? There must be at least one rule-like
concept which "knows about" the iteration as one coherent unit. We conclude
that even if two such intertwined rules are kept separate, a third rule (a
schema) should exist which (at least implicitly) has a rhs which combines them.
Thus rule schemata do more than just unify general properties of rules; there
must also be schemata of the kind that relate function to mechanism.
---------------
[5] In AM, even the specific rules may be "big" in the sense that their very
precise knowledge may involve much testing to trigger and, once triggered, may
conclude some elaborate results.
Another problem crops up if we consider what happens if one of the coupled rules
is modified. Often, some corresponding change should be made in all its
companions. For example, if a term is generalized (replacement of "prime" by
"number" everywhere) then the same substitution had probably better be done in
each rule which this one is supposed to couple with. What we are saying is
that, for RSs which modify their own rules, it can be dangerous to split up a
single conceptual process into a bunch of rules which interact in more or less
fixed ways when run, without continuing to reason about them as an integral whole,
&i[like any other algorithm] composed of parts. Here again, we find pressure to
treat RSs as algorithms, not vice-versa.
Finally, let us make a few irresistible observations. The whole notion of
coupling via meaningless tokens is aesthetically repugnant and quite contrary to
"pure" production system spirit. At the least, when a coupled rule deposits some
"intermediate-state" message in a DS, one would like that message to be
meaningful to many rules in the system, to have some significance itself. We can
see that entries in a DS have an expected meaning to the read access functions
that examine the DS[6]. If this purity is maintained, then any apparent
"coupling" would be merely superficial: each rule could stand alone as a whole
domain-dependent heuristic. Thus no harm should come from changing a single
rule, and more rules could be added that act on the "intermediate message" of
the coupling. Such meaningful, dynamic couplings should be encouraged. Only the
meaningless, tight couplings are criticized.
---------------
[6] Perhaps this "meaning" could even be expressed formally as an invariant which
the write access functions for the DS must never violate.
4.3. Distribution of Knowledge Between Rules and DS
A common "pure" idea is that all knowledge of substance ought to be represented
as rules. Independent of such rules, the DS forms no meaningful whole
initially, nor has it any final interpretation. The "answer" which the RS
computes is not stored in the DS; rather, the answer consists in the process of
rule firings[7]. The DS is "just" an intermediate vehicle of control
information.
Contrary to this, we say that rules ought to have a &i[symbiotic] relationship
to DSs. The DSs hold meaningful domain-dependent information, and rules process
knowledge represented in them. For RSs designed to perform scientific research,
the DSs contain the theory, and the rules contain methods of theory formation.
But much domain-dependent knowledge is contingent (conditional). E.g., "If two
relatively prime numbers n and m divide x, then so must their product nm".
Shouldn't such If/Then information be encoded
as rules? We answer an emphatic &i[No]. Just as there is a distribution of "all
knowledge of substance" between rules and DSs, so too must the contingent
information be partitioned between them. We shall illustrate two particular
issues: (i) Much contingent knowledge can be stored implicitly in DSs; (ii) Some
conditional knowledge is inappropriate to store as rules.
When designing a DS, it is possible to provide mechanisms for holding a vast
amount of information &i[implicitly]. In AM, e.g., the organization of concepts
---------------
[7] The sequence of actions in time. In addition, perhaps, the "answer" may
involve a few of their side-effects. E.g., (Respond `YES').
into a Genl/Spec hierarchy (plus the assumed heritability properties; see 3.4)
permits a rule to ask for "all concepts more general than Primes" as if that
were a piece of data explicitly stored in a DS. In fact, only direct
generalizations are stored ("The immediate generalization of Primes is
Numbers"), and a "rippling" mechanism automatically runs up the Genl links to
assemble a complete answer. Thus the number of specific answers the DS can
provide is far greater than the number of individual items in the DS. True,
these DS mechanisms will use up extra time in processing to obtain the answer;
but notice that any particular request is very unlikely to be made. Just as each
rule knows about a general situation, of which it will only see a few instances,
that same quality (a wide potential applicability) is just as valuable for
knowledge in DSs. These are situations where, like Dijkstra's multiplier [8],
the mechanism must provide any of the consequences of its knowledge quickly on
demand, but in its lifetime will only be asked a few of them.
Now that we have seen how contingent knowledge &i[can] be encoded into DSs, let
us see some cases where it &i[should] be -- i.e., where it is not appropriate to
encode it as rules of the system. Many things get called implication, and only
some of them correspond to rule application. For instance, there is logical
entailment (e.g., if A and B then A), physical causation (e.g., if it rains, then
the ground will get wet), probable associations (e.g., if it is wet underfoot,
then it has probably been raining). These all describe the way the world is, not
the way the perceiver of the world behaves. Contrast them with knowledge of the
form "If it is raining, then open the umbrella". We claim that the latter kind
of situation-action relationships should be encoded as rules for the RS, but
that the other types of implication should be stored declaratively within the
DS. Let's try to justify this distinction.
The situation-action rules indicate how to behave in the world; the other types
of implication merely indicate expected relationships within the world. The
rules of a RS are meant to encode potential procedural actions which are obeyed
by the system, while the DSs encode the way the world (the RS's environment)
behaves in terms of some model of it. The essential thing to consider is what
relations we want to obtain &i[in time]; these are the things we should cast as
rules. The lhs of a rule measures some aspect of knowledge presently in DSs,
while the rhs of the rule defines the attention of the system (regarded as a
processor feeding off of the DS) in the immediate future. This is the heart of
why rule-sets are algorithms. They are algorithms for guiding the application
of other (DS processing) algorithms. It also explains why other kinds of
implications are unsuitable to be rules. Consider causal implication ("Raining
--> Wet"). While the lhs could be a rule's lhs (it measures an aspect of any
situation), the rhs should &i[not] be a rule's rhs (it does not indicate an
appropriate action for the system to take). Most purist production systems have
(often implicitly!) a rule of the form "If the left side of an implication is
true in the database, Then assert the right side". This is only one kind of
rule, of course, capable of dealing with implications. For example, MYCIN and
LT [17] (implicitly) follow a very different rule: "If the rhs of an implication
will satisfy my goal, Then the lhs of the implication is now the new goal".
Other rules are possible. The point is that the implications themselves are
declarative knowledge, not rules. In summary, then, it may be very important to
distinguish rules (attention guides) from mere implications (access guides), and
to store the latter within the DSs. This policy was not motivated by the
scientific inference task for our RS. We believe it to be a worthwhile guideline
in the design of &i[any] RS.
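The distinction is easy to make concrete. In the sketch below (illustrative
names; atomic propositions only), the implication is a single datum in the DS,
while the forward-chaining rule and the MYCIN/LT-style backward rule are
merely two different consumers of it:

    (defvar *implications* '((raining . wet-ground))
      "Implications live in the DS as declarative data, not as rules.")

    (defun forward-rule (database)
      "RULE: if the lhs of a stored implication holds, assert its rhs."
      (dolist (imp *implications* database)
        (when (and (member (car imp) database)
                   (not (member (cdr imp) database)))
          (push (cdr imp) database))))

    (defun backward-rule (goal)
      "RULE (MYCIN/LT style): if an implication's rhs would satisfy
       GOAL, adopt its lhs as the new goal."
      (loop for imp in *implications*
            when (eq (cdr imp) goal)
              collect (car imp)))

    ;; (forward-rule '(raining))    =>  (wet-ground raining)
    ;; (backward-rule 'wet-ground)  =>  (raining)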
4.4. Interpreter
After a rule fires, the naive interpretation policy (#9 in Fig. 1) demands that
&i[any] rule in the system can potentially be the next one selected to fire.
This is true regardless of the speed-up techniques used in any particular
implementation (say, by preprocessing the lhs's into a discrimination net [22]).
But consider RSs for scientific discovery tasks. Their task -- both at the top
level and frequently at lower levels -- is quite open-ended. If twenty rules
trigger as relevant to such an open-ended activity (e.g., gathering empirical
data, inducing conjectures, etc.) then there is much motivation for continuing
to execute just these twenty rules for a while. They form an &i[ad hoc]
plausible search algorithm for the agenda item selected.
A RS for discovery might reasonably be given a complex interpreter (rule-firing
policy). AM, for example, experimented with a two-pass interpreter: first, a
best-first, agenda-driven resource allocator and attention focusser selects the
job it finds most interesting; second, it locates the set of relevant rules
(usually about 35) for the job, and begins executing them one after another (in
best-first order of specificity) until the resources allocated in the first step
run out [20]. The overall rating of the job which these rules should satisfy
determines the amount of cpu time and list cells that may be used up before the
rules are interrupted and the job is abandoned.
For example, say the job were "Find examples of Primes". It's allotted 35 cpu
seconds and 300 list cells, due to its overall priority rating just before it
was plucked from the agenda. 23 rules are relevant. The first one quickly finds
that "2" and "3" are primes. Should the job halt right then? No, not if the real
reason for this job is to gather as much data as possible, data from which
conjectures will be suggested and tested. In that case, many of the remaining
rules should be fired as well. They will produce not only &i[additional]
examples, but perhaps other &i[types] of examples.
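That second pass can be sketched as follows (the rule structure is as in the
earlier sketch; the host Lisp's run-time clock stands in for AM's cpu clock,
and the list-cell budget is omitted for brevity):

    (defstruct rule lhs rhs)   ; as in the earlier sketch

    (defun run-relevant-rules (job rules seconds-allotted)
      "Fire RULES on JOB, most specific first, until every rule has run
       or the job's time allotment is exhausted."
      (let ((deadline (+ (get-internal-run-time)
                         (* seconds-allotted
                            internal-time-units-per-second))))
        (dolist (rule rules :finished)
          (when (> (get-internal-run-time) deadline)
            (return :resources-exhausted))  ; abandon job, keep partial data
          (when (funcall (rule-lhs rule) job)
            (funcall (rule-rhs rule) job)))))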
The jobs on AM's agenda are really just mini-research questions which are
plausible to spend time investigating. Although phrased as specific requests,
each one is really a research proposal, a topic to concentrate upon. We found it
necessary to deviate from the simplest uniform interpreter for clarity (for
instance, a human can understand each state of the first pass standing alone),
for efficiency (knowing that all 35 are relevant, there is no need to find them
35 times), and for power (applying qualitatively different kinds of rules yields
various types of examples). We claim this quality of open-endedness will recur
in any RS whose task is free concept exploration. This includes all scientific
discovery but not all scientific inference.
5. Speculations for a New Discovery System
The spirit of this paper has been to give up straightforward simplicity for
clarity, efficiency, and power. Several examples have been cited, but we
speculate that there are further tradeoffs of this kind which are applicable to
RSs that make discoveries.
Often, there are several possible ways the designer may view the task (and
subtasks) of the intended RS. Say we wish to add the notion of "proof" to AM.
Should we represent proof as a resolution search, as a process of criticism
and improvement [11] spiralling toward a solution, as a natural deduction
cascade,...? Although any one of these task-views might perform respectably,
we advocate incorporating all of them, despite the concomitant costs of added
processing time, space, and interfacing. In fact, we wish never to exclude
the possibility of the system acquiring another task-view.
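Schematically, such coexisting task-views might be kept as follows
(expository Python again; the view names come from the alternatives just
listed, and the bodies are empty placeholders):

    # Each task-view of "proof" is an independent strategy; the system
    # keeps all of them and may acquire more at run time.
    def resolution_search(goal):     return None   # placeholder
    def criticize_and_improve(goal): return None   # Lakatos-style spiral
    def natural_deduction(goal):     return None   # placeholder

    proof_views = {"resolution":        resolution_search,
                   "lakatos":           criticize_and_improve,
                   "natural-deduction": natural_deduction}

    def prove(goal):
        for name, view in proof_views.items():  # try every view in turn
            result = view(goal)
            if result is not None:
                return name, result              # report the winning view

    proof_views["some-new-view"] = lambda goal: None  # acquiring a view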
We look for the development of further discovery tools in the form of domain-
independent meta-heuristics that synthesize heuristic rules, and in the form of
terribly abstract heuristic schemata that specialize into newly-discovered
domains. These discovery tools are all part of "getting familiar" with
shallowly understood concepts, which newly synthesized concepts tend to be
at first.
It may even be that symbolic analogy techniques exist, cutting across the
traditional boundaries of knowledge domains.
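As one purely illustrative example of such a schema and its mechanical
specialization (the schema text is our invention, not one of AM's actual
heuristics):

    # A "terribly abstract" heuristic schema, specialized into a
    # newly-discovered domain by filling in the concept name.
    SCHEMA = "If examples of {X} are scarce, Then generalize the definition of {X}"

    def specialize(schema, concept):
        return schema.format(X=concept)

    specialize(SCHEMA, "Primes")
    # -> "If examples of Primes are scarce,
    #     Then generalize the definition of Primes"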
We contemplate a system that keeps track of (and has methods with which it
attempts to improve) the design of its own DSs, its own control structure, and
perhaps even its own design constraints. Although working in (a collection of)
specific domains, this would be a general &i[symbol system discoverer], capable
of picking up and exploring formulations, testing them and improving them.
5.1. A New Set of Design Constraints
Below are 12 principles for designing a RS whose task is that of scientific
exploration. They are the result of reconsidering the original principles (Fig.
1) in the light shed by work on AM. Most of the "pure" principles we mentioned
in Fig. 1 are changed, and a few new ones have emerged.
FIGURE 3: Scientific Research RS Architecture
[1. ]Principle of Several Appropriate Memories. For each type of
knowledge which must be dealt with in its own way, a separate DS
should be maintained. The precise nature of each DS should be
chosen so as to facilitate the access (read/write) operations
which will be most commonly requested of it.
[2. ]Principle of Maximal DS Accesses. The set of primitive DS
access operations (i.e., the read tests which a rule's lhs may
perform, and the write actions which a rhs may call for) is
chosen to include the largest packages (clusters, chunks,...) of
activity which are commonly needed and which can be performed
efficiently on the DS.
[3. ]Principle of Facetted DS Elements. For ever-growing data
structures, there is much to be gained and little lost by
permitting parts of one DS item to point to other DS items.
[4. ]Principle of Rules as Data. The view which the RS designer
takes of the system's task may require that some rules be capable
of reasoning about the rules in the RS (adding new ones, deleting
old ones, keeping track of rules' performance, modifying existing
rules,...). Some of the methods the RS uses to deal with
scientific knowledge may be applicable to dealing with rules as
well. In such cases, the system's rules may naturally be
represented as new entries in the existing DS which holds the
scientific theory. (A sketch of this arrangement follows the
figure.)
[5. ]Principle of Regular Rules. Each rule is actually a rule
&u[schema]. Sophisticated processing may be needed both to
determine which instance(s) are relevant and to find the precise
sequence of actions to be executed. Such schemata are often quite
long.
[6. ]Principle of Avoiding Meaninglessly-Coupled Rules. Passing
special-purpose loop control notes back and forth is contrary
both to the spirit of pure RSs and to efficiency. If rules are to
behave as coupled, the least we demand is that the notes they
write and read for each other be meaningful entries in the DS (several
other rules can interpret the same note, and other rules might
have written it).
[7. ]Principle of Controlled Environment. For many tasks, it is
detrimental to permit external stimuli (from an environment) to
enter any DS at random. At the least, the RS should be able to
distinguish these alien inputs from internally-generated DS
entries.
[8. ]Principle of Contingent Knowledge. In designing the DS, much
knowledge may be stored &u[implicitly]; e.g., by where facts are
placed in an analogical (e.g., hierarchical) network. The DS
should be designed so as to maximize this kind of concentrated
information storage. Hence, hard-working access functions are
needed to encode and decode the full meaning of DSs.
[9. ]Principle of Rules as Attention Guides. Knowledge should be
encoded as rules when it is intended to serve as a guide for the
system's attention, i.e., to direct its behavior. Other kinds of
information, even if stated in conditional form, should be
relegated to DSs (either explicitly as entries, or implicitly as
special access functions).
[10. ]Principle of Inertial Interpretation. In tasks like
scientific research, where relevant rules will be performing
inherently open-ended activities (e.g., data-gathering), such
rules should be allowed to continue for a while even after they
have nominally carried out the activity (e.g., gathered one piece
of data). In such cases, the occasional wasted time and space is
more than compensated for by the frequent acquisition of valuable
knowledge.
[11. ]Principle of Openness. A rule system can be enriched by
incorporating into its design several independent views of the
knowledge it handles.
[12. ]Principle of Support of Discovery by Design. By
representing its own design explicitly, the RS could study and
improve those concepts, thereby improving itself. This includes
the DS design (i.e., the facet specifications; if the
input/output requirements change with time, so should the rule
system's data structures), the access function algorithms, how to
couple them, the function of various rules, the interpretation
policy of the RS, etc.
Rule systems whose designs adhere to these guidelines will be large and impure.
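As a small illustration of Principles 1, 4, and 12 acting together (a
hypothetical arrangement, not AM's actual representation): heuristic rules
are stored as facet entries of the concepts they concern, so a meta-rule
can examine and prune rules with the very machinery that ordinary rules
apply to concepts.

    # Rules as data: a heuristic is just another facet entry, so the
    # system can keep performance records on its own rules and edit them.
    concepts = {
        "Primes": {
            "definition": "n > 1 whose only divisors are 1 and n",
            "examples":   [2, 3, 5, 7],
            "heuristics": [
                {"if":   "examples are scarce",
                 "then": "weaken the definition",
                 "performance": {"fired": 4, "useful": 3}},
            ],
        },
    }

    def prune_weak_heuristics(concept, threshold=0.2):
        # A meta-rule: delete heuristics whose track record is poor.
        facet = concepts[concept]["heuristics"]
        facet[:] = [h for h in facet
                    if h["performance"]["useful"]
                       / max(1, h["performance"]["fired"]) > threshold]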
We have mentioned throughout the paper several new complications which the
principles introduce. Trying to build such a RS for a task to which a pure
production rule system is well suited will probably result in disaster.
Nevertheless, empirical evidence suggests that RSs having this architecture
are quite natural -- and relatively tractable to construct -- for open-ended
tasks like scientific discovery.
&i[ACKNOWLEDGEMENT]
This research builds upon Lenat's Ph.D. thesis at Stanford University, and he
wishes to deeply thank his advisers and committee members: Bruce Buchanan,
Edward Feigenbaum, Cordell Green, Donald Knuth, and Allen Newell. In addition,
he gladly acknowledges the ideas he received in discussions with Dan Bobrow,
Avra Cohn, and Randy Davis. Similarly, ideas received by Harris in two long and
fruitful associations, with John Seely Brown and with Roger Schank, have
contributed to this work. Many of our ideas have evolved through discussions at
CMU this past year, notably with Don Cohen, John McDermott, Kamesh Ramakrishna,
Paul Rosenbloom, James Saxe, and especially Herbert Simon.
&i[REFERENCES]
[0] Bledsoe, W. W., and Tyson, M., &i[The UT Interactive Prover], University
of Texas at Austin, Depts. of Mathematics and Computer Sciences Automatic
Theorem Proving Project Report No. 17, May, 1975.
[1] Bobrow, D., "Natural Language Input for a Computer Problem Solving
System", in (Minsky, M., editor), &i[Semantic Information Processing], The
MIT Press, Cambridge, Massachusetts, 1968.
[2] Bobrow, D., and Winograd, T., &i[An Overview of KRL, A Knowledge
Representation Language], Journal of Cognitive Science, Vol 1, No 1,
January 1977.
[3] Bobrow, R., and Brown, J. S., "Systematic Understanding", in (Bobrow, D.,
and Collins, A., eds.), &i[Representation and Understanding], Academic
Press, N.Y., 1975.
[4] Buchanan, Bruce G., G. Sutherland, and E. Feigenbaum, &i[Heuristic
Dendral: A Program for Generating Explanatory Hypotheses in Organic
Chemistry], in (Meltzer and Michie, eds.) Machine Intelligence 4, American
Elsevier Pub., N.Y., 1969, pp. 209-254.
[5] Buchanan, Bruce G., &i[Applications of Artificial Intelligence to
Scientific Reasoning], Second USA-Japan Computer Conference, Tokyo, August
26-28. Published by AFIPS and IPSJ, Tokyo, 1975, pp. 189-194.
[6] Davis, Randall, &i[Applications of Meta Level Knowledge to the
Construction, Maintenance, and Use of Large Knowledge Bases], SAIL AIM-
283, Artificial Intelligence Laboratory, Stanford University, July, 1976.
[7] Davis, R., and King, J., &i[An overview of production systems], Report
STAN-CS-75-524, Memo AIM-271, Stanford U. CS Department, 1975.
[8] Dijkstra, E. W., "Notes on Structured Programming", in Dahl, Dijkstra, and
Hoare, &i[Structured Programming], Academic Press, London, 1972, pp. 1-82.
[9] Hayes-Roth, Frederick, and Victor R. Lesser, &i[Focus of Attention in a
Distributed Speech Understanding System], Computer Science Dept. Memo,
Carnegie Mellon University, Pittsburgh, Pa., January 12, 1976.
[10] Hewitt, Carl, &i[Viewing Control Structures as Patterns of Passing
Messages], MIT AI Lab Working Paper 92, April, 1976.
[11] Lakatos, Imre, &i[Proofs and Refutations], Cambridge U. Press, 1976.
[12] Lenat, D., &i[BEINGs: Knowledge as Interacting Experts], 4th IJCAI,
Tbilisi, Georgian SSR, USSR, 1975.
[13] Lenat, D., &i[AM: An Artificial Intelligence Approach to Discovery in
Mathematics as Heuristic Search], SAIL AIM-286, Artificial Intelligence
Laboratory, Stanford University, July, 1976. Jointly issued as Computer
Science Dept. Report No. STAN-CS-76-570.
[14] McCracken, Don, &i[A Parallel Production System Architecture for Speech
Understanding], CMU CS Dept. Ph.D. Thesis, 1977.
[15] Minsky, Marvin, "A Framework for Representing Knowledge", in (Winston,
P., ed.), &i[The Psychology of Computer Vision], McGraw Hill, N.Y. 1975,
pp. 211-277.
[16] Moran, T.P., &i[The symbolic imagery hypothesis: An empirical
investigation via a production system simulation of human behavior in a
visualization task], CMU CS Dept. Thesis, 1973. See also 3IJCAI, pp. 472-
477.
[17] Newell, Allen, J. Shaw, and H. Simon, &i[Empirical Explorations of the
Logic Theory Machine: A Case Study in Heuristics], RAND Corp. Report P-
951, March, 1957.
[18] Newell, Allen, and Simon, Herbert, &i[Human Problem Solving], Prentice-
Hall, Englewood Cliffs, New Jersey, 1972.
[19] Newell, A., &i[Production Systems: Models of Control Structures], CMU
CS Dept. Report, May 1973; also published in (W.G. Chase, ed.), &i[Visual
Information Processing], Academic Press, N.Y., 1973, Chapter 10, pp. 463-526.
[20] Norman, D., and D. Bobrow, &i[On Data-limited and Resource-limited
Processes], Cognitive Psychology, Volume 7, 1975, pp. 44-64.
[21] Polya, George, &i[Mathematics and Plausible Reasoning], Princeton
University Press, Princeton, Vol. 1, 1954; Vol. 2, 1954.
[22] Rychener, M. D., &i[Production Systems as a Programming Language for
Artificial Intelligence Applications], CMU CS Dept. Report, Pittsburgh,
Pa., 1976.
[23] Shortliffe, E. H., &i[MYCIN -- A rule-based computer program for advising
physicians regarding antimicrobial therapy selection], Stanford AI Memo
251, October, 1974.
[24] Waterman, D. A., &i[Adaptive Production Systems], CIP Working Paper 285,
CMU Psychology Dept., 1974. See also 4IJCAI, pp. 296-303.